34 research outputs found

    Pattern discovery in sequence databases : algorithms and applications to DNA/protein classification

    Sequence databases comprise sequence data, which are linear structural descriptions of many natural entities. Approximate pattern discovery in a sequence database can lead to important conclusions or to the prediction of new phenomena. Traditional database technology is not suited to this task, so new techniques need to be developed. In this dissertation, we propose several new techniques for discovering patterns in sequence databases. Our techniques incorporate pattern matching algorithms and novel heuristics for discovery and optimization. Experimental results from applying the techniques to both generated data and DNA/proteins show their effectiveness. We then develop several classifiers using our pattern discovery algorithms and a previously published fingerprint technique. When applied to DNA and protein sequences, these classifiers give information that is complementary to the best classifiers available today.

    Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks

    BACKGROUND: The sequencing of the human genome has enabled us to access a comprehensive list of genes (both experimental and predicted) for further analysis. While a majority of the approximately 30,000 known and predicted human coding genes are characterized and have been assigned at least one function, there remain a fair number of genes (about 12,000) for which no annotation has been made. The recent sequencing of other genomes has provided us with a huge amount of auxiliary sequence data which could help in the characterization of the human genes. Clustering these sequences into families is one of the first steps in performing comparative studies across several genomes. RESULTS: Here we report a novel clustering algorithm (CLUGEN) that has been used to cluster sequences of experimentally verified and predicted proteins from all sequenced genomes using a novel distance metric, which is a neural network score between a pair of protein sequences. This distance metric is based on the pairwise sequence similarity score and the similarity between their domain structures. The distance metric is the probability that a pair of protein sequences are of the same InterPro family/domain, which facilitates the modelling of transitive homology closure to detect remote homologues. The hierarchical average clustering method is applied with the new distance metric. CONCLUSION: Benchmarking studies of our algorithm versus those reported in the literature show that our algorithm provides clustering results with lower false positive and false negative rates. The clustering algorithm is applied to cluster several eukaryotic genomes and several dozen prokaryotic genomes.
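
The clustering step described in this abstract can be illustrated with a minimal sketch. It is not the CLUGEN implementation: the function name `average_linkage`, the stopping threshold, and the probability values are all hypothetical, and the conversion "distance = 1 - same-family probability" is an assumption made here so that likely family members end up close together.

```python
from itertools import combinations


def average_linkage(dist, n, threshold):
    """Agglomerative clustering with average linkage.

    dist: dict mapping frozenset({i, j}) -> distance between items i and j
    n: number of items, labelled 0..n-1
    threshold: stop once the closest pair of clusters is farther apart than this
    """
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        # Average of all pairwise distances between members of a and b.
        total = sum(dist[frozenset((i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the indices of the two closest clusters.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical same-family probabilities for four sequences (made-up values).
same_family_prob = {
    (0, 1): 0.95, (0, 2): 0.10, (0, 3): 0.05,
    (1, 2): 0.15, (1, 3): 0.10, (2, 3): 0.90,
}
# Assumption: a high same-family probability means a small distance.
dist = {frozenset(pair): 1.0 - p for pair, p in same_family_prob.items()}

clusters = average_linkage(dist, n=4, threshold=0.5)
print(clusters)
```

With these made-up values, sequences 0 and 1 merge first, then 2 and 3, and the two resulting groups stay separate because their average distance exceeds the threshold.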

    Uncovering mechanisms of transcriptional regulations by systematic mining of cis regulatory elements with gene expression profiles

    BACKGROUND: In contrast to the traditional biology approach, where the expression patterns of a handful of genes are studied at a time, microarray experiments enable biologists to study the expression patterns of many genes simultaneously from gene expression profile data and to decipher the underlying biological mechanism from the observed gene expression changes. While the statistical significance of the gene expression data can be deduced by various methods, the biological interpretation of the data presents a challenge. RESULTS: A method, called CisTransMine, is proposed to help infer the underlying biological mechanisms for the observed gene expression changes in microarray experiments. Specifically, this method predicts potential cis-regulatory elements in promoter regions which could regulate gene expression changes. The approach builds on the MotifADE method published in 2004 and extends it with two modifications: up-regulated and down-regulated genes are tested separately, and tests have been implemented to identify combinations of transcription factors that work synergistically. The method has been applied to a genome-wide expression dataset intended to study myogenesis in a mouse C2C12 cell differentiation model. The results shown here both confirm prior biological knowledge and facilitate the discovery of new biological insights. CONCLUSION: The results indicate that CisTransMine is a robust method for uncovering hidden mechanisms of transcriptional regulation.
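
The idea of testing up-regulated and down-regulated genes separately can be sketched as follows. This is not the CisTransMine or MotifADE procedure, whose statistics are not given in the abstract; a one-sided hypergeometric enrichment test is swapped in as a simple stand-in, and the gene identifiers, set sizes, and function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N genes, K of which carry the motif."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def motif_enrichment(motif_genes, regulated, all_genes):
    """One-sided p-value for over-representation of a motif in a regulated gene set."""
    hits = len(motif_genes & regulated)
    return hypergeom_sf(hits, len(all_genes), len(motif_genes & all_genes), len(regulated))


# Hypothetical data: 20 genes, 5 of which carry the motif in their promoter.
all_genes = {f"g{i}" for i in range(20)}
motif_genes = {"g0", "g1", "g2", "g3", "g4"}
up_regulated = {"g0", "g1", "g2", "g3", "g10"}
down_regulated = {"g4", "g11", "g12", "g13", "g14"}

# The two directions are tested separately rather than pooled.
p_up = motif_enrichment(motif_genes, up_regulated, all_genes)
p_down = motif_enrichment(motif_genes, down_regulated, all_genes)
print(f"up: {p_up:.4f}, down: {p_down:.4f}")
```

In this made-up example the motif is concentrated in the up-regulated set only, a signal that could be diluted if both directions were pooled into a single "changed" set.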

    The capacity of target silencing by Drosophila


    Xenopus


    Development of comprehensive functional genomic screens to identify novel mediators of osteoarthritis.

    OBJECTIVE: The aim of this study was to develop high-throughput assays for the analysis of major chondrocyte functions that are important in osteoarthritis (OA) pathogenesis and methods for high-level gene expression and analysis in primary human chondrocytes. METHODS: In the first approach, complementary DNA (cDNA) libraries were constructed from OA cartilage RNA and full-length clones were selected. These cDNAs were transferred into a retroviral vector using Gateway Technology. Full-length clones were over-expressed in human articular chondrocytes (HAC) by retroviral-mediated gene transfer. The induction of OA-associated markers, including aggrecanase-1 (Agg-1), matrix metalloproteinase-13 (MMP-13), inducible nitric oxide synthase (iNOS), cyclooxygenase-2 (COX-2), collagen IIA and collagen X was measured by quantitative real-time polymerase chain reaction (QPCR). Induction of a marker gene was verified by independent isolation of 2-3 clones per gene, re-transfection followed by QPCR as well as nucleotide sequencing. In the second approach, whole cDNA libraries were transduced into chondrocytes and screened for chondrocyte cluster formation in three-dimensional agarose cultures. RESULTS: Using enhanced green fluorescent protein (eGFP) as a marker gene, it was shown that the retroviral method has a transduction efficiency of >90%. A total of 40 verified hits were identified in the QPCR screen. The first set of 19 hits coordinately induced iNOS, COX-2, Agg-1 and MMP-13. The most potent of these genes were the tyrosine kinases Axl and Tyro-3, receptor interacting kinase-2 (RIPK2), tumor necrosis factor receptor 1A (TNFR1A), fibroblast growth factor (FGF) and its receptor FGFR, MUS81 endonuclease and Sentrin/SUMO-specific protease 3. The second set of seven hits induced both Agg-1 and MMP-13 but none of the other markers. Five of these seven genes regulate the phosphoinositide-3-kinase pathway. The most potently induced OA marker was iNOS. This marker was induced 20-500 fold by seven genes. Collagen IIA was also induced by seven genes, the most potent being transforming growth factor beta (TGFbeta)-stimulated protein TSC22, vascular endothelial growth factor (VEGF) and splicing factor 3a. This screening assay did not identify inducers of collagen X. The second chondrocyte cluster formation screen identified 14 verified hits. Most of the genes inducing cluster formation were kinases. Additional genes had not been previously known to regulate chondrocyte cluster formation or any other chondrocyte function. CONCLUSIONS: The methods developed in this study can be applied to screen for genes capable of inducing an OA-like phenotype in chondrocytes on a genome-wide scale and identify novel mediators of OA pathogenesis. Thus, coordinated functional genomic approaches can be used to delineate key genes and pathways activated in complex human diseases such as OA.